Abstract

The current deficiencies of virtual
environment (VE) systems are well
known: annoying lag time in drawing the
current view, environments that are
drastically simplified in an effort to
reduce that lag time, low resolution, and
a narrow field of view. The scripting of
animations is an application of VE 
technology which can be carried out 
successfully despite these deficiencies. 
None of the deficiencies is present in the 
final product, a smoothly-moving high- 
resolution animation displaying detailed 
models. In this animation system, the 
user is represented in the VE by a 
human computer model with the same 
bodily proportions. Using magnetic 
tracking, the motions of the model’s 
upper torso, head and arms are 
controlled by the user’s movements (18 
DOF). The model’s lower torso and 
global position and orientation are 
controlled by a spaceball and keypad (12
DOF). Using this system, human
motion scripts can be extracted from the
movements of a user immersed in
a simplified virtual environment. The
recorded data is used to define key 
frames; motion is interpolated between 
them and post processing is done to add a 
more detailed environment. The result 
is a considerable savings in time and a 
much more natural-looking movement 
of a human figure in a smooth and 
seamless animation. 


1.0 Introduction

When composing animations portraying 
moving humans, a way of ensuring 
natural-looking movements is to 
capture motion from actual humans 
[1,2,3,4,5]. Furthermore, placing the
person whose movements are being 
captured in a mockup of the environment 
which is to be displayed allows 
registration of position and motion 
accurately with respect to that 
environment. We propose the use of a 
"soft" mockup or a virtual environment 
(VE) for this purpose. 

Human motion can be scripted by 
specifying individual joint angles or by 
specifying the goals of the motion and 
computing the joint angles with an 
inverse kinematics algorithm [2]. 
However, the motion produced by either of
these methods tends to have an unnatural
appearance [6,7,8]. Also, we have found
that capturing actual motion takes
considerably less time than specifying
individual joint angles or interactively
specifying movement goals, and produces
more realistic motion.

The current deficiencies of VE systems 
are well known. There are painful 
tradeoffs between resolution and field of 
view and between the time it takes to 
draw the current view and the 
complexity of the virtual environment 
[9,10]. Typically one must settle for an
unnaturally narrow field of view and a
simplified, cartoon-like visual
environment. Because the environment
in which the motion is captured need
only be an approximation of the
environment which appears in the final 
animation, these deficiencies are not a 
serious hindrance for scripting 
animations. 

2.0 Background 

At the Graphics Research and Analysis
Facility (GRAF) at the Johnson Space
Center, Houston, the authors research
human modeling as it relates to the
human factoring of man-in-the-loop
systems. Animations involving human
movement are of particular interest for
optimizing human performance and for
checking consistency and continuity of
task designs [11]. Heretofore, the
composition of animations involving 
human movement has been a painstaking 
operation in which a user at an 
interactive workstation specifies each 
movement of each joint. The method of 
scripting described in this paper results 
in a considerable savings of time and 
produces more natural-looking human 
movements in an animation. 

3.0 Description of the system 

3.1 Tracking and Computing the 
Human Motion. 

The first phase involves the capture of 
the tracking information from actual 
human motion and the computation and 
display of the resultant motion of the 
human model within the VE. In order to
ensure that the model's movements are
accurate and that its joint angles mimic 
those of the user, it is necessary for the 
figure's major anthropometric 
measurements to be the same as those of 
the user. 

The user wears a head-mounted display 
(HMD) slaved to the viewpoint by means 
of a magnetic tracker. The user is 
personified in the VE as a human model 
figure with the viewpoint at the figure's 
eye sites. A total of four trackers 
suffices to mimic upper-body motion 
(16 DOF) [1,2,3]; the trackers are
positioned on the head, wrists and upper
back. The upper-body joint angles are
computed with an inverse kinematics
(IK) algorithm [6,7,8]. Wrist
radial/ulnar deviation is omitted,
leaving only 6 DOF for the arm and
shoulder, which makes their joint angle
computations deterministic; hence the
joint angles are rapidly computed and
for most motions are constrained to
match those of the user. The shoulder
complex motion is ignored, leading to
some error in the motion. Inclusion of
the complex clavicle and scapular
motion would make the inverse-kinematic
computation non-deterministic and
difficult to control
with one tracker. It is important to note
that, in this phase, a simplified VE is 
sufficient, as long as it contains the 
visual cues needed for the motion. 
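
As an illustration of why a reduced arm
model yields a deterministic solution,
the sketch below computes a single
elbow flexion angle in closed form from
the shoulder-to-wrist distance (law of
cosines). It is not the Jack algorithm;
the function name, the segment lengths
and the assumption that the shoulder
position is already known are all
hypothetical.

```python
import math


def elbow_flexion(shoulder, wrist, upper_arm_len, forearm_len):
    """Return elbow flexion in radians (0 means the arm is fully extended).

    Hypothetical helper: the shoulder and wrist are 3-D points, and the two
    segment lengths are assumed to come from the user's anthropometry.
    """
    distance = math.dist(shoulder, wrist)
    # Clamp to the reachable range so tracker noise cannot break acos().
    shortest = abs(upper_arm_len - forearm_len)
    longest = upper_arm_len + forearm_len
    distance = max(shortest, min(distance, longest))
    # Law of cosines gives the interior angle at the elbow.
    cos_interior = (
        upper_arm_len ** 2 + forearm_len ** 2 - distance ** 2
    ) / (2.0 * upper_arm_len * forearm_len)
    cos_interior = max(-1.0, min(1.0, cos_interior))
    interior = math.acos(cos_interior)
    # Flexion is measured from the straight-arm pose.
    return math.pi - interior


# Example: a wrist held closer than full reach implies a bent elbow.
angle = elbow_flexion((0.0, 0.0, 0.0), (0.3, 0.3, 0.0), 0.3, 0.3)
```

Because the wrist position determines
the distance uniquely, there is exactly
one flexion value per tracker sample,
which is the property that keeps the
computation fast.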

The software system is divided into two 
drawing servers, one reach server, and 
one magnetic tracking server (see
Figure 1). The main client retrieves
the current state of the user from the
tracking server, polls the spaceball for
translation and rotation information,
and merges the spaceball information
with the tracker information. This
information is passed to the reach 
server which computes the resulting 
motion in terms of changes in joint 
angles [12]. The reach server
computation is done in a software 
package called Jack initiated under a 
NASA university grant by our 
laboratory at the University of 
Pennsylvania [6]. The changes in the 
position and orientation of the figure as 
well as the joint angle changes of the 
body are relayed to the drawing servers 
which update the environment and pipe 
the needed stereo views to the
head-mounted display. The advantage of this
distributed design is not only speed but
also that any server can reside on any
machine on the Internet (e.g., tracking
information could come from another
facility).
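
The data flow described above can be
sketched as one update cycle of the main
client. This is only an in-process
approximation: the class and method
names are hypothetical, the stubs return
placeholder values, and no network
transport or real inverse kinematics is
shown.

```python
class StubTrackingServer:
    """Hypothetical stand-in for the magnetic tracking server."""

    def current_state(self):
        # Placeholder positions for the four trackers named in the text.
        return {
            "head": (0.0, 0.0, 1.7),
            "left_wrist": (0.3, 0.2, 1.1),
            "right_wrist": (0.3, -0.2, 1.1),
            "upper_back": (-0.1, 0.0, 1.4),
        }


class StubSpaceball:
    """Hypothetical stand-in for the polled spaceball device."""

    def poll(self):
        return {"translation": (0.1, 0.0, 0.0), "rotation": (0.0, 0.0, 0.0)}


class StubReachServer:
    """Hypothetical stand-in; a real reach server would run IK here."""

    def solve(self, merged):
        return {name: 0.0 for name in merged["trackers"]}


class StubDrawingServer:
    """Hypothetical stand-in that records what it was asked to draw."""

    def __init__(self):
        self.frames = []

    def update(self, figure_motion, joint_changes):
        self.frames.append((figure_motion, joint_changes))


def update_cycle(tracking, spaceball, reach, drawing_servers):
    """Run one pass: gather input, solve the reach, refresh every display."""
    trackers = tracking.current_state()
    figure_motion = spaceball.poll()
    merged = {"trackers": trackers, "figure_motion": figure_motion}
    joint_changes = reach.solve(merged)
    for server in drawing_servers:
        server.update(figure_motion, joint_changes)
    return joint_changes


# Example: one cycle driving two drawing servers.
displays = [StubDrawingServer(), StubDrawingServer()]
update_cycle(StubTrackingServer(), StubSpaceball(), StubReachServer(), displays)
```

Keeping each server behind a narrow
interface like this is what would let
any of them be moved to another machine
without changing the main client.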

The position and orientation of the figure
can be controlled by an operator using a
six-degree-of-freedom spaceball. Each
magnetic tracker matrix is first
converted to the coordinate system of the
figure (at the base of the feet). The
spaceball information (relative mode
translation and rotation pulses) is
accumulated and applied to each of the
magnetic tracker matrices in the figure
coordinate system. The composite
matrices are converted back to the global
coordinate system to be presented to the
inverse kinematic reach server. The
scheme allows the figure to be moved by
the operator using the spaceball in a
natural manner (with respect to the
figure's coordinate system) while the
motions of the user are applied to the
human model's new translated and rotated
coordinate system. The joint angles of
the lower limbs can be changed by the
operator using the buttons on the
spaceball device [1].
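
The three steps above (convert to figure
coordinates, apply the accumulated
spaceball motion, convert back to global
coordinates) can be sketched with 4x4
homogeneous matrices. The helper names,
the column-vector convention and the
order in which pulses are accumulated
are assumptions made for this
illustration, not details taken from the
system.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rigid_inverse(m):
    """Invert a rigid transform: transpose the rotation, negate the offset."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def accumulate(accumulated, pulse):
    """Fold one relative-mode spaceball pulse into the running transform."""
    return matmul(accumulated, pulse)


def move_tracker(tracker_global, figure_base, accumulated):
    """Apply the accumulated spaceball motion to one tracker matrix."""
    # 1. Express the tracker in the figure's frame (origin at the feet).
    in_figure = matmul(rigid_inverse(figure_base), tracker_global)
    # 2. Apply the accumulated operator motion in that frame.
    moved = matmul(accumulated, in_figure)
    # 3. Return to global coordinates for the reach server.
    return matmul(figure_base, moved)


# Example: the operator turns the figure a quarter turn about vertical.
accumulated = accumulate(translation(0.0, 0.0, 0.0), rotation_z(math.pi / 2))
wrist = move_tracker(translation(2.5, 0.0, 1.5), translation(2.0, 0.0, 0.0),
                     accumulated)
```

Working in the figure's frame means a
rotation pulse turns the figure about
its own feet rather than about the
global origin, which matches the
"natural manner" described above.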


3.2 Scripting the Animation. 

Scripting the animation involves 
processing of the captured human motion 
sequences to produce the key frames of 
the animation. It requires two people to 
use the system. The first is the actual 
personified user with the magnetic 
trackers appropriately positioned on the 
body. The second is the operator who 
will control the position and orientation 
of the figure in the VE based on the 
user’s requests. The operator will also 
command the system to write key frames 
of the animation at appropriate times. 
The issue of producing an animation that 
has a realistic time-line is still being 
researched. 

The operator initiates the session by 
bringing the user to within reaching 
distance of the specific work 
environment. The user then performs 
the activity as prescribed by the task 
plan. At the operator's signal, the 
system records the state of every 
moveable part. The user tells the 
operator where and how to orient the 
figure. Upon completion of the session, 
a file of human motions is produced. 
These recorded data are used to define
key frames; post processing software
interpolates motion between the key 
frames to produce a smooth animation. 
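
A minimal sketch of the interpolation
step follows. The paper does not state
which interpolation scheme is used, so
plain linear interpolation of joint
angles is assumed here; the function
name, the dictionary layout of a key
frame and the fixed number of
in-between frames are also hypothetical.

```python
def interpolate_keyframes(keyframes, frames_between):
    """Return the full frame list, filling gaps between recorded key frames.

    Each key frame is assumed to be a dict mapping a joint name to an angle,
    and every key frame is assumed to list the same joints.
    """
    if not keyframes:
        return []
    frames = []
    steps = frames_between + 1
    for start, end in zip(keyframes, keyframes[1:]):
        for i in range(steps):
            # i == 0 reproduces the starting key frame of this segment.
            frames.append({
                joint: start[joint] + (end[joint] - start[joint]) * i / steps
                for joint in start
            })
    # The last key frame closes the sequence.
    frames.append(dict(keyframes[-1]))
    return frames


# Example: two recorded key frames with two in-between frames.
recorded = [{"elbow": 0.0, "neck": 10.0}, {"elbow": 90.0, "neck": 10.0}]
animation = interpolate_keyframes(recorded, 2)
```

A smoother scheme (for example a spline
through the key frames) could replace
the linear blend without changing the
recorded data.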

3.3 Producing the High 
Resolution Animation. 

The recording of the scripting is done in 
a simplified VE. Because the post 
processing is not time-critical, it can 
use more complex models supplying 
details that were missing in the VE. The 
simplified human model is replaced with 
a high-resolution model and the 
environment is made much more 
detailed. The keyfile is then replayed 
into the animation frame generation 
program which interpolates between all 
the key frames. It is also possible to do
other special post processing, which
includes texture mapping and realistic
lighting (see the section on future work
below) (Figure 2).

4.0 Discussion 

A narrowed field of view can affect 
distance judgments adversely [13,14]; 
however, we found that, within the 
extent of human reach, it was not 
difficult to make sufficiently accurate 
movements. Also, knowing the relative
size of objects (e.g., the size of a hand
relative to a workstation screen)
and knowing the approximate location of
at least one of them (one's own hand)
seemed to increase the knowledge of relative
distances. One reason may be that
stereopsis is a useful distance cue within a
person's reach extent [10].

It can be argued that a helmet-mounted
display is not needed to script the human
animations. We tried scripting an
animation with the user and the operator
working the system from two global views
of the human model shown on monitors.
When the user
tried to view what was being displayed
on the monitors, it changed the motion of
the human model. There exists an
“animation uncertainty principle”:
the item being measured (the
human being) changes as soon as one
tries to see one's own changes on a
display monitor. For a natural-looking
animation, the user needs to see
what they are looking at and working
with. It is believed that the more
immersed an individual is in the
environment, the more realistic the
motions will appear. A helmet-mounted
display provides some of that
functionality, with some severe
limitations.

The user's left and right-eye views can 
be seen by the spaceball operator on 
monitors; however, they are not 
particularly convenient to use when 
repositioning or reorienting the VE. 
Hence, a third view is needed which 
would give the spaceball operator an 
overview of the action; ideally, the 
operator should be able to move this 
viewpoint. 

The dramatic effect of realistic motion 
was caused by very subtle motions. 
When the user turned her head, there
would be slight motions of the waist and
hands. These motions would be very
difficult to reproduce manually. When 
the user looked up, the back would arch 
by a few degrees and the elbows might 
swing back. 

The spaceball offered a very distinct 
advantage. The user could stay 
relatively close to the magnetic tracker 
source (this is needed for accuracy) and 
still be “virtually” moved to any 
location with any orientation within the 
virtual environment. Moreover,
because the HMD and the magnetic
trackers have many cables, it was
also safer for the user to stay seated on
a chair, moving only the head, torso and arms.

With more trackers, we could capture 
lower body motion also. Walking while 
tethered with an HMD and magnetic 
trackers presents some obvious 
problems. (Perhaps it is fortunate that 
one does not walk in microgravity.) 


5.0 Conclusion 

A virtual environment can provide a 
rapid and convenient way of capturing 
human motion sequences. Immersion in 
the virtual environment allows the user 
to be positioned correctly relative to the 
environment and to perform accurate 
reaching movements. A simplified VE 
can be used to give an adequate display 
rate for capturing the motion and then 
replaced by a more detailed environment 
when the captured motion is used to 
generate an animation. Other post 
processing can provide additional special 
effects in the finished product, a smooth 
and seamless animation. 

6.0 Future Work 

Several extensions of this work are 
planned for the future. 

We intend to allow the figure and user to 
have different bodily dimensions; thus, 
for instance, we will be able to script 
movements for the 5th and 95th 
percentile individuals so beloved of 
human factors engineers. 

A right-handed CyberGlove has already 
been incorporated into the system. The 
CyberGlove senses the motions of the 
joints of the hand (18 DOF). It gives
2 DOF for the wrist, supplying the
missing wrist radial/ulnar deviation
and leaving only 5 DOF for the arm and
shoulder IK algorithm. Once a
left-handed glove is acquired, animations
involving both hands will be done.

There is no limit to the amount of
post-processing that can be done once
the motion is captured. For instance, 
the Radiance algorithm is used in the 
GRAF to do realistic light computations 
[15]; we would like to use it to provide 
realistic lighting for the animations. 
Additional texture maps, or more 
detailed texture maps, can also be used. 
If needed, a texture-map-based recursive
animation (animation inside an
animation) could be created to reflect,
for instance, changing views on a
monitor of the Space Shuttle cargo bay
operation. This animation could be
displayed with texture maps on a
monitor within the environment.

Collision detection would be a real 
convenience in the VE to ensure that the 
reaches are accurate. Collision detection 
is computationally expensive, but even a 
restricted form of it would be useful in 
the detection of the intersection of one 
point at the end of the user's extended 
finger with any of a set of "reachable" 
objects [16].
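
The restricted test suggested above, one
fingertip point against a set of
reachable objects, might look like the
following sketch. Representing each
object by an axis-aligned bounding box
is an assumption made for simplicity, and
the function and object names are
hypothetical.

```python
def point_in_box(point, box_min, box_max):
    """True if the 3-D point lies inside or on the axis-aligned box."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, box_min, box_max))


def touched_objects(fingertip, reachable):
    """Return the names of the reachable objects containing the fingertip.

    `reachable` maps an object name to a (box_min, box_max) pair of corners.
    """
    return [
        name
        for name, (box_min, box_max) in reachable.items()
        if point_in_box(fingertip, box_min, box_max)
    ]


# Example: two hypothetical workstation objects and one fingertip sample.
reachable = {
    "switch": ((0.40, 0.10, 1.00), (0.50, 0.20, 1.10)),
    "screen": ((0.30, -0.30, 1.20), (0.35, 0.30, 1.60)),
}
hits = touched_objects((0.45, 0.15, 1.05), reachable)
```

The cost grows only with the number of
reachable objects, since a single point
is tested per frame rather than the
whole figure.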

It is possible to record the animation 
with a viewpoint different from the 
user’s, or with a different field of view. 
One possibility is to allow the viewpoint 
to move and to specify its position 
interactively as the animation frames 
are produced. 

Two viewpoints from the recorded data 
could be reconstructed and used to make 
a stereo presentation of the animation 
that could be viewed with the HMD. 
Synchronization of the two images 
requires some special measures. 

Finally, as soon as we acquire more
trackers, we intend to put a second user
into a VE.

Figure 1. Software/Hardware System Configuration (all servers and clients run on
Silicon Graphics workstations).

Figure 2. High resolution human model working at a space station workstation.